FIG. 1 



COLLECTED WEB DOCUMENTS

SYNTACTIC ANALYSIS

RESOLVE QUERY

ORDER-RANKING

INFORMATION REQUEST

INTERFACE

WEB DOCUMENT

USER



FIG. 2 



START

S10: RECORD QUERY WORD GIVEN BY USER

S20: QUERY WORD CONSTITUTED BY KEYWORD USED IN RECENT SEARCH AND FREQUENCY

S30: COMPUTE ENTROPY AMONG QUERY WORD, USER PROFILE AND KEYWORD OF WEB DOCUMENT

S40: IS DATA SUFFICIENT FOR LEARNING KOHONEN NEURAL NETWORK?
  NO: S60
  YES: S50

S60: ENSURE NUMBER OF DOCUMENTS BY USING BOOTSTRAP ALGORITHM

S50: DETERMINE WEIGHTS FOR INITIAL CONNECTION FOR BAYESIAN SOM
  UTILIZE BAYESIAN LEARNING
  DETERMINE PRIOR INFORMATION TO BE USED AS INITIAL VALUE FOR PARAMETER
  DETERMINED PRIOR INFORMATION PROBABILITY DISTRIBUTION NETWORK PARAMETER USE GAUSSIAN DISTRIBUTION

S70: PERFORM REAL-TIME DOCUMENT CLUSTERING UTILIZING ENTROPY VALUE AND BAYESIAN SOM NEURAL NETWORK MODEL
  BAYESIAN SOM = KOHONEN NEURAL NETWORK + BAYESIAN LEARNING
  DETERMINE CLUSTERING VARIABLE
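
The data-sufficiency check, the bootstrap step, and the Gaussian initial weights named in FIG. 2 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the plain resampling-with-replacement scheme, and the threshold of 30 samples (borrowed from the comment in FIG. 6) are assumptions made for illustration. The standard deviation of 1/sqrt(number of Kohonen nodes) follows the FIG. 6 comment.

```python
import math
import random

MIN_SAMPLES = 30  # assumed threshold, taken from the FIG. 6 comment


def bootstrap_resample(rows, target_size, rng):
    """Draw target_size rows from rows with replacement (plain bootstrap)."""
    if not rows:
        raise ValueError("cannot resample an empty data set")
    return [rng.choice(rows) for _ in range(target_size)]


def ensure_enough_data(rows, rng, min_samples=MIN_SAMPLES):
    """Return the rows unchanged if there are enough of them,
    otherwise a bootstrap sample containing min_samples rows."""
    if len(rows) >= min_samples:
        return list(rows)
    return bootstrap_resample(rows, min_samples, rng)


def initial_weights(num_nodes, num_inputs, rng):
    """Sample one weight vector per Kohonen node from a Gaussian prior
    with mean 0 and standard deviation 1 / sqrt(num_nodes)."""
    sigma = 1.0 / math.sqrt(num_nodes)
    return [[rng.gauss(0.0, sigma) for _ in range(num_inputs)]
            for _ in range(num_nodes)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
entropy_rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy entropy vectors
training_rows = ensure_enough_data(entropy_rows, rng)
weights = initial_weights(num_nodes=16, num_inputs=2, rng=rng)
```

Here three toy entropy vectors are too few, so `training_rows` is filled by resampling, and `weights` holds one two-dimensional vector for each of the 16 assumed nodes.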



FIG. 4 



NOUN, POSTPOSITIONAL WORD, NON-TERM DICTIONARY

MORPHOLOGY ANALYSIS

SYNTAX TAGGING

NOUN EXTRACTION

CORPUS (KT set95)

BUILD AIR INFORMATION




FIG. 5A 




FIG. 5B 



FIG. 5C 




FIG. 5D 



FIG. 6 



Algorithm ClusteringOfDocs(UserQryProfile[N], Ret_Docs[N])
// COMPUTE ENTROPY BY USING USER PROFILE AND KEYWORDS EXTRACTED
// FROM EACH DOCUMENT, AND PRODUCE DOCUMENT CLUSTER ACCORDING
// TO SIMILARITY
Set i, j, k to 0
for i = 1 to NumOfRetDocs
  for j = 1 to NumOfQuery
    for k = 1 to NumOfTerms
      DocMatrix[i][j] = CalcEntropy(UserQryProfile[j], Ret_Docs[k]);
// COMPUTE P-NUMBER OF ENTROPY (KEYWORD, USER PROFILE), AND OBTAIN
// MATRIX HAVING SIZE OF N*P
Call CalcSim(Return SimDoc[NumOfRetDocs], DocMatrix[j+k]);
// COMPUTE DISTANCE MATRIX HAVING SIZE OF N*N BETWEEN N-NUMBERS
// OF DOCUMENTS
for i = 1 to NumOfRetDocs
  Call CreatCluster(Return DocCluster[NumOfCluster], SimDoc[i]);
// FORM CLUSTER BASED ON DISTANCE MATRIX
for j = 1 to NumOfCluster
  Call CalcSim(UserQryProfile[NumOfCluster], DocCluster[j])
// OBTAIN DEGREE OF SIMILARITY BETWEEN EACH CLUSTER AND QUERY WORD
// GIVEN BY USER, AND EACH CLUSTER AND USER PROFILE
End ClusteringOfDocs
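
The N*P entropy matrix and the N*N distance matrix described in the comments above can be sketched in Python as follows. The figure does not define how the entropy is computed, so this sketch assumes a per-term Shannon contribution, -p*log2(p), where p is the relative frequency of a profile term in a document. It also assumes Euclidean distance between matrix rows. All function names are hypothetical.

```python
import math
from collections import Counter


def term_entropy(term, doc_tokens):
    """Shannon contribution -p*log2(p) of one term, where p is the term's
    relative frequency in the document (0.0 if the term is absent)."""
    if not doc_tokens:
        return 0.0
    p = Counter(doc_tokens)[term] / len(doc_tokens)
    return -p * math.log2(p) if p > 0 else 0.0


def entropy_matrix(profile_terms, docs):
    """N x P matrix: one row per document, one column per profile term."""
    return [[term_entropy(t, d) for t in profile_terms] for d in docs]


def distance_matrix(rows):
    """N x N matrix of Euclidean distances between document rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


profile = ["som", "web"]  # toy user profile / query terms
docs = [
    ["som", "bayes", "som", "web"],
    ["web", "rank"],
    ["som", "bayes", "som", "web"],
]
doc_matrix = entropy_matrix(profile, docs)
sim_doc = distance_matrix(doc_matrix)
```

Documents with identical term statistics, such as the first and third toy documents, end up at distance zero, so a clustering step driven by `sim_doc` would group them together.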
Algorithm RankOfCluster(Ret_Docs[N])
// DOCUMENT CLUSTER BY BAYESIAN SOM AND ORDER-RANKING ALGORITHM
set i, j, k to 0;
for i = 1 to k;
  for j = 1 to 3;
    Index_Vector[k][j] = ExtrOfIndex(Ret_Docs[k]);
    Call MutualInformation(User_Q[NumOfQuery], Index_Vector[k][j]);
    DocEntropyVector[i][j] = CalculateEntropy(Ret_Docs[k]);
  end j;
end i;

if NumOfData <= 30 Call BootStrap(DocEntropyVector[i][j]);
// PRODUCE SUFFICIENT DATA COLLECTION REQUIRED FOR LEARNING BAYESIAN
// NEURAL NETWORK BY EMPLOYING STATISTICAL BOOTSTRAP ALGORITHM
// IF DATA FOR LEARNING IS SMALL (FOR EXAMPLE, LESS THAN 30)

DecisionOfInitialWeight();
// DETERMINE INITIAL WEIGHTS FOR KOHONEN NETWORK BY UTILIZING BAYESIAN
// PRIOR DISTRIBUTION. THAT IS, THE MEAN IS ZERO, AND THE RECIPROCAL OF
// THE SQUARE ROOT OF THE NUMBER OF NODES OF THE KOHONEN LAYER IS
// UTILIZED AS THE STANDARD DEVIATION

Call BayesianSOM();

for i = 1 to NumOfCluster;
  CalculationOfNorm(Cluster[NumOfCluster]);
end i;

RankOfCluster(Value_of_Norm[NumOfCluster]);
// RE-RANK DOCUMENT CLUSTER HAVING HIGH SIMILARITY TO QUERY WORD
// GIVEN BY USER
End RankOfCluster
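
The final norm and re-ranking step can be sketched in Python under stated assumptions. The figure does not say which norm is used or what it is measured against. This sketch assumes each cluster is summarized by the centroid of its document vectors, and that the norm is the Euclidean distance between that centroid and a query vector, so a smaller norm means higher similarity. The names `centroid` and `rank_clusters` are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_clusters(clusters, query_vector):
    """Return cluster ids ordered from most to least similar to the query,
    using the Euclidean norm of (centroid - query) as the ranking value.
    Empty clusters are skipped."""
    norms = {
        cluster_id: math.dist(centroid(vectors), query_vector)
        for cluster_id, vectors in clusters.items()
        if vectors
    }
    return sorted(norms, key=norms.get)


clusters = {
    "a": [[0.0, 0.0], [0.0, 2.0]],  # centroid (0, 1)
    "b": [[1.0, 1.0], [1.0, 1.0]],  # centroid (1, 1)
    "c": [[5.0, 5.0]],              # centroid (5, 5)
}
ranking = rank_clusters(clusters, query_vector=[1.0, 1.0])
```

With the toy data, cluster "b" coincides with the query vector and is ranked first, followed by "a" and then "c".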



